AITopics | transition point

Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints

David Simchi-Levi, Yunzong Xu

Neural Information Processing Systems, Feb-13-2026, 14:16:13 GMT

Neural Information Processing Systems http://nips.cc/

bandit problem, budget, bwsc problem, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.32)


Predicting the Formation of Induction Heads

Aoyama, Tatsuya, Wilcox, Ethan Gotlieb, Schneider, Nathan

arXiv.org Artificial Intelligence, Nov-24-2025

Arguably, specialized attention heads dubbed induction heads (IHs) underlie the remarkable in-context learning (ICL) capabilities of modern language models (LMs); yet, a precise characterization of their formation remains unclear. In this study, we investigate the relationship between statistical properties of training data (for both natural and synthetic data) and IH formation. We show that (1) a simple equation combining batch size and context size predicts the point at which IHs form; (2) surface bigram repetition frequency and reliability strongly affect the formation of IHs, and we find a precise Pareto frontier in terms of these two values; and (3) local dependency with high bigram repetition frequency and reliability is sufficient for IH formation, but when the frequency and reliability are low, categoriality and the shape of the marginal distribution matter.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.16893

Country: Europe > Austria (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)


Phase Transitions between Accuracy Regimes in L2 regularized Deep Neural Networks

Ersoy, Ibrahim Talha, Wiesner, Karoline

arXiv.org Artificial Intelligence, Aug-29-2025

Increasing the L2 regularization of Deep Neural Networks (DNNs) causes a first-order phase transition into the under-parametrized phase -- the so-called onset-of-learning. We explain this transition via the scalar (Ricci) curvature of the error landscape. We predict new transition points as the data complexity is increased and, in accordance with the theory of phase transitions, the existence of hysteresis effects. We confirm both predictions numerically. Our results provide a natural explanation of the recently discovered phenomenon of 'grokking' as DNN models getting stuck in a local minimum of the error surface, corresponding to a lower accuracy phase. Our work paves the way for new probing methods of the intrinsic structure of DNNs in and beyond the L2 context.

artificial intelligence, machine learning, transition, (17 more...)

arXiv.org Artificial Intelligence

2505.06597

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)


DeepAtlas: a tool for effective manifold learning

Hughes, Serena, Hamilton, Timothy, Kolokotrones, Tom, Deeds, Eric J.

arXiv.org Artificial Intelligence, Aug-28-2025

Manifold learning builds on the "manifold hypothesis," which posits that data in high-dimensional datasets are drawn from lower-dimensional manifolds. Current tools generate global embeddings of data, rather than the local maps used to define manifolds mathematically. These tools also cannot assess whether the manifold hypothesis holds true for a dataset. Here, we describe DeepAtlas, an algorithm that generates lower-dimensional representations of the data's local neighborhoods, then trains deep neural networks that map between these local embeddings and the original data. Topological distortion is used to determine whether a dataset is drawn from a manifold and, if so, its dimensionality. Application to test datasets indicates that DeepAtlas can successfully learn manifold structures. Interestingly, many real datasets, including single-cell RNA-sequencing, do not conform to the manifold hypothesis. In cases where data is drawn from a manifold, DeepAtlas builds a model that can be used generatively and promises to allow the application of powerful tools from differential geometry to a variety of datasets.

artificial intelligence, machine learning, manifold, (17 more...)

arXiv.org Artificial Intelligence

2508.19479

Country:

North America > United States > New York (0.28)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (1.00)

Industry:

Education (0.72)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints

David Simchi-Levi, Yunzong Xu

Neural Information Processing Systems, Aug-19-2025, 23:22:17 GMT

MAB problem, the learner (i.e., decision-maker) is allowed to switch freely between actions, and

bandit problem, budget, bwsc problem, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.32)


MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data

Bekhit, Mahmoud, Salah, Ahmad, Alrawahi, Ahmed Salim, Attia, Tarek, Ali, Ahmed, Eldesokey, Esraa, Fathalla, Ahmed

arXiv.org Artificial Intelligence, Jul-15-2025

Motion capture (MoCap) data from wearable Inertial Measurement Units (IMUs) is vital for applications in sports science, but its utility is often compromised by missing data. Despite numerous imputation techniques, a systematic performance evaluation for IMU-derived MoCap time-series data is lacking. We address this gap by conducting a comprehensive comparative analysis of statistical, machine learning, and deep learning imputation methods. Our evaluation considers three distinct contexts: univariate time-series, multivariate across subjects, and multivariate across kinematic angles. To facilitate this benchmark, we introduce the first publicly available MoCap dataset designed specifically for imputation, featuring data from 53 karate practitioners. We simulate three controlled missingness mechanisms: missing completely at random (MCAR), block missingness, and a novel value-dependent pattern at signal transition points. Our experiments, conducted on 39 kinematic variables across all subjects, reveal that multivariate imputation frameworks consistently outperform univariate approaches, particularly for complex missingness. For instance, multivariate methods achieve up to a 50% mean absolute error reduction (MAE from 10.8 to 5.8) compared to univariate techniques for transition point missingness. Advanced models like Generative Adversarial Imputation Networks (GAIN) and Iterative Imputers demonstrate the highest accuracy in these challenging scenarios. This work provides a critical baseline for future research and offers practical recommendations for improving the integrity and robustness of MoCap data analysis.
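To make the evaluation setup in the abstract above concrete, here is a minimal sketch of two of the named missingness mechanisms (MCAR and block missingness) and of an MAE computed only over the hidden positions. It is an illustration, not the authors' benchmark code: the function names, the toy series, and the mean-imputation baseline are all assumptions.

```python
import random


def mcar_mask(n, rate, seed=0):
    """Missing completely at random: hide each position independently with probability `rate`."""
    rng = random.Random(seed)
    return [rng.random() < rate for _ in range(n)]


def block_mask(n, start, length):
    """Block missingness: hide one contiguous run of `length` samples beginning at `start`."""
    return [start <= i < start + length for i in range(n)]


def mean_impute(series, mask):
    """Toy univariate baseline (assumed): fill hidden values with the mean of the observed ones.

    Requires at least one observed (unmasked) value.
    """
    observed = [x for x, hidden in zip(series, mask) if not hidden]
    fill = sum(observed) / len(observed)
    return [fill if hidden else x for x, hidden in zip(series, mask)]


def masked_mae(truth, imputed, mask):
    """Mean absolute error measured only where values were hidden."""
    errors = [abs(t - p) for t, p, hidden in zip(truth, imputed, mask) if hidden]
    return sum(errors) / len(errors)


# Hypothetical joint-angle trace with a two-sample gap.
angles = [1.0, 2.0, 3.0, 4.0, 5.0]
gap = block_mask(len(angles), start=1, length=2)
filled = mean_impute(angles, gap)
error = masked_mae(angles, filled, gap)
```

Scoring only the masked positions keeps the metric from being diluted by values the imputer never had to guess.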

data quality, imputation, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2507.10334

Country:

Africa > Middle East > Egypt (0.68)
Asia > Middle East (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


When can in-context learning generalize out of task distribution?

Goddard, Chase, Smith, Lindsay M., Ngampruetikorn, Vudtiwat, Schwab, David J.

arXiv.org Machine Learning, Jun-9-2025

In-context learning (ICL) is a remarkable capability of pretrained transformers that allows models to generalize to unseen tasks after seeing only a few examples. We investigate empirically the conditions necessary on the pretraining distribution for ICL to emerge and generalize out-of-distribution. Previous work has focused on the number of distinct tasks necessary in the pretraining dataset. Here, we use a different notion of task diversity to study the emergence of ICL in transformers trained on linear functions. We find that as task diversity increases, transformers undergo a transition from a specialized solution, which exhibits ICL only within the pretraining task distribution, to a solution which generalizes out of distribution to the entire task space. We also investigate the nature of the solutions learned by the transformer on both sides of the transition, and observe similar transitions in nonlinear regression problems. We construct a phase diagram to characterize how our concept of task diversity interacts with the number of pretraining tasks. In addition, we explore how factors such as the depth of the model and the dimensionality of the regression problem influence the transition.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2506.05574

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > Canada (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


NeUQI: Near-Optimal Uniform Quantization Parameter Initialization

Lin, Li, Hu, Xinyu, Wan, Xiaojun

arXiv.org Artificial Intelligence, May-28-2025

Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with uniform quantization representation is favored for its efficiency and ease of deployment since uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on ≥2-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they primarily focus on quantization methodologies, while the initialization of quantization parameters is underexplored and still relies on the suboptimal Min-Max strategies. In this work, we propose NeUQI, a method devoted to efficiently determining near-optimal initial parameters for uniform quantization. NeUQI is orthogonal to prior quantization methodologies and can seamlessly integrate with them. The experiments with the LLaMA and Qwen families on various tasks demonstrate that our NeUQI consistently outperforms existing methods. Furthermore, when combined with a lightweight distillation strategy, NeUQI can achieve superior performance to PV-tuning, a much more resource-intensive approach.
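For readers unfamiliar with the Min-Max baseline that the abstract above calls suboptimal, the following sketch shows one common form of it: the quantization grid is stretched to span exactly the smallest and largest weight. This is not NeUQI itself, whose procedure the abstract does not describe; the function name, the per-list granularity, and the toy weights are assumptions.

```python
def minmax_quantize(weights, bits):
    """Uniform quantization with Min-Max initialization (illustrative baseline only).

    The scale and offset are chosen so that the integer codes 0 .. 2**bits - 1
    span exactly [min(weights), max(weights)].
    """
    levels = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant list, where the range would be zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    dequantized = [w_min + c * scale for c in codes]
    return codes, dequantized, scale, w_min


# Hypothetical weight group quantized to 2 bits (four levels).
weights = [-1.0, -0.2, 0.3, 1.0]
codes, approx, scale, offset = minmax_quantize(weights, bits=2)
max_error = max(abs(w - a) for w, a in zip(weights, approx))
```

Because the grid is pinned to the extremes, a single outlier widens `scale` for every other weight, which is one intuition for why the choice of initial parameters matters at low bit widths.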

large language model, machine learning, quantization, (19 more...)

arXiv.org Artificial Intelligence

2505.17595

Country: North America > United States > California (0.28)

Genre: Research Report > Promising Solution (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Small Models, Smarter Learning: The Power of Joint Task Training

Both, Csaba, Hoover, Benjamin, Strobelt, Hendrik, Krotov, Dmitry, Weidele, Daniel Karl I., Martino, Mauro, Dehmamy, Nima

arXiv.org Artificial Intelligence, May-27-2025

The ability of a model to learn a task depends strongly on both the task difficulty and the model size. We aim to understand how task difficulty relates to the minimum number of parameters required for learning specific tasks in small transformer models. Our study focuses on the ListOps dataset, which consists of nested mathematical operations. We gradually increase task difficulty by introducing new operations or combinations of operations into the training data. We observe that sum modulo n is the hardest to learn. Curiously, when combined with other operations such as maximum and median, the sum operation becomes easier to learn and requires fewer parameters. We show that joint training not only improves performance but also leads to qualitatively different model behavior. We show evidence that models trained only on SUM might be memorizing and fail to capture the number structure in the embeddings. In contrast, models trained on a mixture of SUM and other operations exhibit number-like representations in the embedding space, and a strong ability to distinguish parity. Furthermore, the SUM-only model relies more heavily on its feedforward layers, while the jointly trained model activates the attention mechanism more. Finally, we show that learning pure SUM can be induced in models below the learning threshold of pure SUM, by pretraining them on MAX+MED. Our findings indicate that emergent abilities in language models depend not only on model size, but also the training curriculum.
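As a rough illustration of the nested operations the abstract above refers to, the sketch below evaluates a ListOps-style expression built from SUM (sum modulo n), MAX, and MED. The bracketed token format, the default modulus of 10, and the choice of the lower median for even-length argument lists are assumptions made for this example, not details taken from the paper.

```python
def evaluate(tokens, modulus=10):
    """Evaluate a ListOps-style token sequence such as '[SUM 3 [MAX 4 9 ] 8 ]'.split()."""
    ops = {
        "SUM": lambda xs: sum(xs) % modulus,            # sum modulo n
        "MAX": max,
        "MED": lambda xs: sorted(xs)[(len(xs) - 1) // 2],  # lower median (assumed)
    }
    stack = []
    for tok in tokens:
        if tok == "]":
            # Pop arguments back to the nearest operator name, then apply it.
            args = []
            while not isinstance(stack[-1], str):
                args.append(stack.pop())
            op = stack.pop()
            stack.append(ops[op](args[::-1]))
        elif tok.startswith("["):
            stack.append(tok[1:])   # operator name, kept as a string marker
        else:
            stack.append(int(tok))  # operand
    return stack[0]


nested = evaluate("[SUM 3 [MAX 4 9 ] 8 ]".split())  # (3 + 9 + 8) mod 10
median = evaluate("[MED 1 5 3 ]".split())
```

Task difficulty in this setting can then be varied by restricting which operator names are allowed to appear in the generated expressions.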

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.18369

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


A biconvex method for minimum-time motion planning through sequences of convex sets

Marcucci, Tobia, Halm, Mathew, Yang, Will, Lee, Dongchan, Marchese, Andrew D.

arXiv.org Artificial Intelligence, Apr-29-2025

We consider the problem of designing a smooth trajectory that traverses a sequence of convex sets in minimum time, while satisfying given velocity and acceleration constraints. This problem is naturally formulated as a nonconvex program. To solve it, we propose a biconvex method that quickly produces an initial trajectory and iteratively refines it by solving two convex subproblems in alternation. This method is guaranteed to converge, returns a feasible trajectory even if stopped early, and does not require the selection of any line-search or trust-region parameter. Exhaustive experiments show that our method finds high-quality trajectories in a fraction of the time of state-of-the-art solvers for nonconvex optimization. In addition, it achieves runtimes comparable to industry-standard waypoint-based motion planners, while consistently designing lower-duration trajectories than existing optimization-based planners.

Selecting the most effective motion-planning algorithm for a robotic system often requires balancing three competing objectives: reliability, computational efficiency, and trajectory quality. Consider Sparrow, the robot arm in Figure 1 that sorts individual products into bins before they get packaged in the Amazon warehouses. The algorithms that move Sparrow must be extremely reliable, as these robots handle millions of diverse products every day, and each failure requires expensive interventions. They must be efficient, since every millisecond spent planning is taken away from other crucial computations, and limits the robot's reactivity to sensor observations. Finally, they should generate trajectories that push the robot to its physical limits, so that the work-cell throughput is maximized and the hardware is fully utilized. Unfortunately, general-purpose methods for motion planning do not excel in all of these areas at once.

Sampling-based methods like PRM [18], RRT [19], and their asymptotically optimal versions [17] can be fast enough for real-time applications. They are also reliable in low-dimensional spaces, where dense sampling is computationally feasible. However, they become significantly less effective as the space dimension grows. Additionally, although their kino-dynamic variants support differential constraints [20, 16, 22], sampling-based methods remain considerably less practical for designing smooth continuous trajectories than for producing polygonal paths. Trajectory-optimization methods based on nonconvex programming [1, 33] scale well to high-dimensional spaces and explicitly factor in the robot kinematics and dynamics. Over the years, these techniques have become significantly faster [39, 13] and, with the advent of specialized GPU implementations [35], they are now even viable for real-time motion planning.

artificial intelligence, constraint, trajectory, (18 more...)

arXiv.org Artificial Intelligence

2504.18978

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
